AITopics | pose refinement

3D Gaussian Splatting based Scene-independent Relocalization with Unidirectional and Bidirectional Feature Fusion

Neural Information Processing Systems | Jun-9-2026, 14:38:52 GMT

Visual localization is a critical component across various domains. The recent emergence of novel scene representations, such as 3D Gaussian Splatting (3D GS), introduces new opportunities for advancing localization pipelines. In this paper, we propose a novel 3D GS-based framework for RGB-based, scene-independent camera relocalization, with three main contributions.

Technology: Information Technology > Artificial Intelligence > Vision (0.42)

SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation

Xu, Beining, Zhu, Siting, Wang, Hesheng

arXiv.org Artificial Intelligence | Jul-17-2025

We propose SGLoc, a novel localization system that directly regresses camera poses from a 3D Gaussian Splatting (3DGS) representation by leveraging semantic information. Our method utilizes the semantic relationship between the 2D image and the 3D scene representation to estimate the 6DoF pose without prior pose information. In this system, we introduce a multi-level pose regression strategy that progressively estimates and refines the pose of the query image from the global 3DGS map, without requiring initial pose priors. Moreover, we introduce a semantic-based global retrieval algorithm that establishes correspondences between 2D (image) and 3D (3DGS map). By matching the extracted scene semantic descriptors of the 2D query image with the 3DGS semantic representation, we align the image with the local region of the global 3DGS map, thereby obtaining a coarse pose estimate. Subsequently, we refine the coarse pose by iteratively minimizing the difference between the query image and the image rendered from 3DGS. Our SGLoc demonstrates superior performance over baselines on the 12scenes and 7scenes datasets, showing excellent capabilities in global localization without an initial pose prior. Code will be available at https://github.com/IRMVLab/SGLoc.
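The final refinement step described in this abstract (iteratively reducing the difference between the query image and a rendered image) can be illustrated with a toy render-and-compare loop. This is only a sketch under strong assumptions, not the SGLoc implementation: the "pose" is a single scalar, the "renderer" is a hypothetical 1D Gaussian bump rather than a 3DGS rasterizer, and the gradient is taken by finite differences. All function names are made up for illustration.

```python
import math

def render(pose, xs):
    """Toy stand-in for a renderer: a 1D intensity profile with a bump centred at `pose`."""
    return [math.exp(-(x - pose) ** 2) for x in xs]

def photometric_loss(pose, query, xs):
    """Mean squared difference between the rendered profile and the query profile."""
    rendered = render(pose, xs)
    return sum((r - q) ** 2 for r, q in zip(rendered, query)) / len(xs)

def refine_pose(coarse_pose, query, xs, step=0.5, iters=200, eps=1e-4):
    """Gradient descent on the photometric loss, using a central finite difference."""
    pose = coarse_pose
    for _ in range(iters):
        grad = (photometric_loss(pose + eps, query, xs)
                - photometric_loss(pose - eps, query, xs)) / (2 * eps)
        pose -= step * grad
    return pose

# Hypothetical setup: the query was "captured" at pose 0.8, the coarse estimate is 0.2.
xs = [i * 0.1 for i in range(-50, 51)]
true_pose = 0.8
query = render(true_pose, xs)
coarse_pose = 0.2
refined = refine_pose(coarse_pose, query, xs)
```

In this toy setting the coarse estimate must already lie within the basin of attraction of the true pose, which is why a coarse stage precedes refinement in pipelines of this kind.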

2507.12027

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.63)

GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity

Ikeda, Takuya, Zakharov, Sergey, Irshad, Muhammad Zubair, Opra, Istvan Balazs, Iwase, Shun, Chen, Dian, Tjersland, Mark, Lee, Robert, Dilly, Alexandre, Ambrus, Rares, Nishiwaki, Koichi

arXiv.org Artificial Intelligence | May-20-2025

We present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Existing methods, while achieving impressive results, often struggle with complex objects, particularly those exhibiting symmetry, intricate geometry or complex appearance. To bridge these gaps, we introduce an adaptive method that combines 3D Gaussian Splatting, hybrid geometry/appearance tracking, and key frame selection to achieve robust tracking and accurate reconstructions across a diverse range of objects. Additionally, we present a benchmark covering these challenging object classes, providing high-quality annotations for evaluating both tracking and reconstruction performance. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.

2505.11905

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting

Yang, Linqi, Zhao, Xiongwei, Sun, Qihao, Wang, Ke, Chen, Ao, Kang, Peng

arXiv.org Artificial Intelligence | Mar-7-2025

6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.

2503.05174

Country:

Asia > China > Heilongjiang Province > Harbin (0.05)
Asia > China > Henan Province > Zhengzhou (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection

Engelbracht, Tim, Zurbrügg, René, Pollefeys, Marc, Blum, Hermann, Bauer, Zuria

arXiv.org Artificial Intelligence | Sep-18-2024

Despite increasing research efforts on household robotics, robots intended for deployment in domestic settings still struggle with more complex tasks such as interacting with functional elements like drawers or light switches, largely due to limited task-specific understanding and interaction capabilities. These tasks require not only detection and pose estimation but also an understanding of the affordances these elements provide. To address these challenges and enhance robotic scene understanding, we introduce SpotLight, a comprehensive framework for robotic interaction with functional elements, specifically light switches. Furthermore, this framework enables robots to improve their environmental understanding through interaction. Leveraging VLM-based affordance prediction to estimate motion primitives for light switch interaction, we achieve up to 84% operation success in real-world experiments. We further introduce a specialized dataset containing 715 images as well as a custom detection model for light switch detection. We demonstrate how the framework can facilitate robot learning through physical interaction by having the robot explore the environment and discover previously unknown relationships in a scene graph representation. Lastly, we propose an extension to the framework to accommodate other functional interactions such as swing doors, showcasing its flexibility. Videos and Code: timengelbracht.github.io/SpotLight/

2409.1187

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications

Hillemann, Markus, Langendörfer, Robert, Heiken, Max, Mehltretter, Max, Schenk, Andreas, Weinmann, Martin, Hinz, Stefan, Heipke, Christian, Ulrich, Markus

arXiv.org Artificial Intelligence | May-7-2024

Neural Radiance Fields (NeRFs) have become a rapidly growing research field with the potential to revolutionize typical photogrammetric workflows, such as those used for 3D scene reconstruction. As input, NeRFs require multi-view images with corresponding camera poses as well as the interior orientation. In the typical NeRF workflow, the camera poses and the interior orientation are estimated in advance with Structure from Motion (SfM). But the quality of the resulting novel views, which depends on different parameters such as the number and distribution of available images, as well as the accuracy of the related camera poses and interior orientation, is difficult to predict. In addition, SfM is a time-consuming pre-processing step, and its quality strongly depends on the image content. Furthermore, the undefined scaling factor of SfM hinders subsequent steps in which metric information is required. In this paper, we evaluate the potential of NeRFs for industrial robot applications. We propose an alternative to SfM pre-processing: we capture the input images with a calibrated camera that is attached to the end effector of an industrial robot and determine accurate camera poses with metric scale based on the robot kinematics. We then investigate the quality of the novel views by comparing them to ground truth, and by computing an internal quality measure based on ensemble methods. For evaluation purposes, we acquire multiple datasets that pose challenges for reconstruction typical of industrial applications, such as reflective objects, poor texture, and fine structures. We show that the robot-based pose determination reaches accuracy similar to that of SfM in non-demanding cases, while having clear advantages in more challenging scenarios. Finally, we present first results of applying the ensemble method to estimate the quality of the synthetic novel view in the absence of ground truth.
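The idea of obtaining metric camera poses from robot kinematics can be sketched as a composition of homogeneous transforms. The following is a minimal illustration, not the authors' pipeline: it assumes the end-effector pose in the robot base frame is available from forward kinematics and that the camera mounting offset is known from a hand-eye calibration. The names `T_base_ee` and `T_ee_cam` and all numeric values are hypothetical.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(yaw_rad, tx, ty, tz):
    """Build a transform from a rotation about the z axis and a translation in metres."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

# Hypothetical values: end-effector pose from forward kinematics (base frame),
# and a fixed camera offset relative to the end effector (assumed calibrated).
T_base_ee = transform(math.pi / 2, 0.5, 0.0, 0.8)
T_ee_cam = transform(0.0, 0.1, 0.0, 0.05)

# Camera pose in the base frame; the translation is in metres, so scale is metric.
T_base_cam = matmul4(T_base_ee, T_ee_cam)
camera_position = [row[3] for row in T_base_cam[:3]]
```

Because every quantity is expressed in the robot's metric frame, the resulting poses carry a defined scale, unlike the up-to-scale poses from SfM that the abstract mentions.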

2405.04345

Country:

Europe > Germany > Lower Saxony > Hanover (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Grasping, Part Identification, and Pose Refinement in One Shot with a Tactile Gripper

Lim, Joyce Xin-Yan, Pham, Quang-Cuong

arXiv.org Artificial Intelligence | Dec-29-2023

The rise in additive manufacturing comes with unique opportunities and challenges. Rapid changes to part design and massive part customization distinctive to 3D-Print (3DP) can be easily achieved. Customized parts that are unique, yet exhibit similar features, such as dental moulds, shoe insoles, or engine vanes, could be industrially manufactured with 3DP. However, the opportunity for massive part customization comes with unique challenges for the existing production paradigm of robotics applications, as the current robotics paradigm for part identification and pose refinement is repetitive, where data-driven and object-dependent approaches are often used. Thus, a bottleneck exists in robotics applications for 3DP parts where massive customization is involved, as it is difficult for feature-based deep learning approaches to distinguish between similar parts such as shoe insoles belonging to different people. As such, we propose a method that augments patterns on 3DP parts so that grasping, part identification, and pose refinement can be executed in one shot with a tactile gripper. We also experimentally evaluate our approach from three perspectives, including real insertion tasks that mimic robotic sorting and packing, and achieve excellent classification results, a high insertion success rate of 95%, and sub-millimeter pose refinement accuracy.

2312.1765

Country:

Asia > Singapore (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

KRF: Keypoint Refinement with Fusion Network for 6D Pose Estimation

Zhan, Irvin Haozhe, Han, Yiheng, Wang, Yu-Ping, Zeng, Long, Liu, Yong-Jin

arXiv.org Artificial Intelligence | Oct-7-2022

Existing refinement methods gradually lose their ability to further improve pose estimation methods' accuracy. In this paper, we propose a new refinement pipeline, Keypoint Refinement with Fusion Network (KRF), for 6D pose estimation, especially for objects with serious occlusion. The pipeline consists of two steps. It first completes the input point clouds via a novel point completion network. The network uses both local and global features, considering the pose information during point completion. Then, it registers the completed object point cloud with the corresponding target point cloud by Color supported Iterative KeyPoint (CIKP). The CIKP method introduces color information into registration and registers the point cloud around each keypoint to increase stability. The KRF pipeline can be integrated with existing popular 6D pose estimation methods, e.g. the full flow bidirectional fusion network, to further improve their pose estimation accuracy. Experiments show that our method improves on the state-of-the-art method from 93.9% to 94.4% on the YCB-Video dataset and from 64.4% to 66.8% on the Occlusion LineMOD dataset. Our source code is available at https://github.com/zhanhz/KRF.
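The registration step mentioned in this abstract belongs to the family of iterative point-cloud alignment methods, whose core operation is a least-squares rigid fit between matched points. The sketch below shows only that core operation, simplified to 2D with known correspondences. It is not CIKP: it uses no color information, no keypoint neighbourhoods, and no iteration, and the function name `rigid_align_2d` is made up for illustration.

```python
import math

def rigid_align_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    Assumes src[i] corresponds to dst[i] (correspondences are given).
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy      # centred source point
        bx, by = qx - dx, qy - dy      # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)

# Hypothetical data: rotate and shift a small point set, then recover the motion.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.0)]
true_theta, true_t = 0.3, (1.0, -2.0)
c, s = math.cos(true_theta), math.sin(true_theta)
dst = [(c * x - s * y + true_t[0], s * x + c * y + true_t[1]) for x, y in src]
theta, (tx, ty) = rigid_align_2d(src, dst)
```

An iterative method would alternate this fit with re-estimating correspondences; in 3D the rotation is usually obtained from an SVD of the cross-covariance matrix rather than from a single angle.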

2210.03437

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multi-path Learning for Object Pose Estimation Across Domains

Sundermeyer, Martin, Durner, Maximilian, Puang, En Yen, Marton, Zoltan-Csaba, Triebel, Rudolph

arXiv.org Machine Learning | Jul-31-2019

We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that not only describes the orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": while the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not need to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization capabilities and iterative refinement strategies on the ModelNet40 and T-LESS datasets. On T-LESS, we achieve state-of-the-art results with our 6D Object Detection pipeline, both in the RGB and depth domain, outperforming learning-free pipelines at much lower runtimes.

1908.00151

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.64)